Fast and versatile algorithm for nearest neighbor search based on a lower bound tree

نویسندگان

  • Yong-Sheng Chen
  • Yi-Ping Hung
  • Ting-Fang Yen
  • Chiou-Shann Fuh
چکیده

In this paper, we present a fast and versatile algorithm which can rapidly perform a variety of nearest neighbor searches. Efficiency improvement is achieved by utilizing the distance lower bound to avoid the calculation of the distance itself if the lower bound is already larger than the global minimum distance. At the preprocessing stage, the proposed algorithm constructs a lower bound tree (LB-tree) by agglomeratively clustering all the sample points to be searched. Given a query point, the lower bound of its distance to each sample point can be calculated by using the internal node of the LB-tree. To reduce the amount of lower bounds actually calculated, the winner-update search strategy is used for traversing the tree. For further efficiency improvement, data transformation can be applied to the sample and the query points. In addition to finding the nearest neighbor, the proposed algorithm can also (i) provide the k-nearest neighbors progressively; (ii) find the nearest neighbors within a specified distance threshold; and (iii) identify neighbors whose distances to the query are sufficiently close to the minimum distance of the nearest neighbor. Our experiments have shown that the proposed algorithm can save substantial computation, particularly when the distance of the query point to its nearest neighbor is relatively small compared with its distance to most other samples (which is the case for many object recognition problems).

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Fast Algorithm for Nearest Neighbor Search Based on a Lower Bound Tree

This paper presents a novel algorithm for fast nearest neighbor search. At the preprocessing stage, the proposed algorithm constructs a lower bound tree by agglomeratively clustering the sample points in the database. Calculation of the distance between the query and the sample points can be avoided if the lower bound of the distance is already larger than the minimum distance. The search proce...

متن کامل

A fast algorithm for a k-NN classifier based on the branch and bound method and computational quantity estimation

Nearest neighbor rule or k-nearest neighbor rule is a technique of nonparametric pattern recognition. Its algorithm is simple and error is smaller than twice the Bayes error if there are enough training samples. However, it requires enormous computational quantities that is proportional to the number of samples and the number of dimensions of feature vector. In this paper, a fast algorithm for ...

متن کامل

A fast nearest neighbor search algorithm based on vector quantization

In this article, we propose a new fast nearest neighbor search algorithm, based on vector quantization. Like many other branch and bound search algorithms [1, 10], a preprocessing recursively partitions the data set into disjointed subsets until the number of points in each part is small enough. In doing so, a search-tree data structure is built. This preliminary recursive data-set partition is...

متن کامل

An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...

متن کامل

An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Pattern Recognition

دوره 40  شماره 

صفحات  -

تاریخ انتشار 2007